Automatic Identification of Bengali Noun-Noun Compounds Using Random Forest

نویسندگان

  • Vivekananda Gayen
  • Kamal Sarkar
چکیده

This paper presents a supervised machine learning approach that uses a machine learning algorithm called Random Forest for recognition of Bengali noun-noun compounds as multiword expression (MWE) from Bengali corpus. Our proposed approach to MWE recognition has two steps: (1) extraction of candidate multi-word expressions using Chunk information and various heuristic rules and (2) training the machine learning algorithm to recognize a candidate multi-word expression as Multi-word expression or not. A variety of association measures, syntactic and linguistic clues are used as features for identifying MWEs. The proposed system is tested on a Bengali corpus for identifying noun-noun compound MWEs from the corpus.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Machine Learning Approach for the Identification of Bengali Noun-Noun Compound Multiword Expressions

This paper presents a machine learning approach for identification of Bengali multiword expressions (MWE) which are bigram nominal compounds. Our proposed approach has two steps: (1) candidate extraction using chunk information and various heuristic rules and (2) training the machine learning algorithm called Random Forest to classify the candidates into two groups: bigram nominal compound MWE ...

متن کامل

Identification of Noun-Noun (N-N) Collocations as Multi-Word Expressions in Bengali Corpus

Noun-Noun compounds, as a subset of Compound Nouns as well as Nominal Compounds play an important role in NLP applications like Machine Translation, Information Retrieval because of the token frequency, type frequency and their occurrence in the world’s languages. Recognition of MWEs requires deep or shallow syntactic preprocessing tools and large corpora. The problem is quite difficult in Beng...

متن کامل

Noun Compound and Named Entity Recognition and their Usability in Keyphrase Extraction

We investigate how the automatic identification of noun compounds and named entities can contribute to keyphrase extraction and we also show how previously identified noun compounds affect named entity recognition and vice versa, how noun compound detection is supported by identified named entities. Our experiments demonstrate that already known noun compounds yield better performance in named ...

متن کامل

Automatic Interpretation of Noun Compounds Using WordNet Similarity

The paper introduces a method for interpreting novel noun compounds with semantic relations. The method is built around word similarity with pretagged noun compounds, based on WordNet::Similarity. Over 1,088 training instances and 1,081 test instances from the Wall Street Journal in the Penn Treebank, the proposed method was able to correctly classify 53.3% of the test noun compounds. We also i...

متن کامل

Handling Of Prepositions In English To Bengali Machine Translation

The present study focuses on the lexical meanings of prepositions rather than on the thematic meanings because it is intended for use in an English-Bengali machine translation (MT) system, where the meaning of a lexical unit must be preserved in the target language, even though it may take a different syntactic form in the source and target languages. Bengali is the fifth language in the world ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013